Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
Vision language models can now generate long-form answers to questions about images -- long-form visual question answers (LFVQA). We contribute VizWiz-LF, a dataset of long-form answers to visual questions posed by blind and low vision (BLV) users. VizWiz-LF contains 4.2k long-form answers to 600 visual questions, collected from human expert describers and six VQA models. We develop and annotate functional roles of sentences of LFVQA and demonstrate that long-form answers contain information beyond the question answer such as explanations and suggestions. We further conduct automatic and human evaluations with BLV and sighted people to evaluate long-form answers. BLV people perceive both human-written and generated long-form answers to be plausible, but generated answers often hallucinate incorrect visual details, especially for unanswerable visual questions (e.g., blurry or irrelevant images). To reduce hallucinations, we evaluate the ability of VQA models to abstain from answering unanswerable questions across multiple prompting strategies.more » « less
-
We study how to optimize the latent space of neural shape generators that map latent codes to 3D deformable shapes. The key focus is to look at a deformable shape generator from a differential geometry perspective. We define a Riemannian metric based on as-rigid-as-possible and as-conformal-as-possible deformation energies. Under this metric, we study two desired properties of the latent space: 1) straight-line interpolations in latent codes follow geodesic curves; 2) latent codes disentangle pose and shape variations at different scales. Strictly enforcing the geometric interpolation property, however, only applies if the metric matrix is a constant. We show how to achieve this property approximately by enforcing that geodesic interpolations are axis-aligned, i.e., interpolations along coordinate axis follow geodesic curves. In addition, we introduce a novel approach that decouples pose and shape variations via generalized eigendecomposition. We also study efficient regularization terms for learning deformable shape generators, e.g., that promote smooth interpolations. Experimental results on benchmark datasets show that our approach leads to interpretable latent codes, improves the generalizability of synthetic shapes, and enhances performance in geodesic interpolation and geodesic shooting.more » « less
-
null (Ed.)Augmentative and alternative communication (AAC) devices enable speech-based communication. However, AAC devices do not support nonverbal communication, which allows people to take turns, regulate conversation dynamics, and express intentions. Nonverbal communication requires motion, which is often challenging for AAC users to produce due to motor constraints. In this work, we explore how socially assistive robots, framed as ''sidekicks,'' might provide augmented communicators (ACs) with a nonverbal channel of communication to support their conversational goals. We developed and conducted an accessible co-design workshop that involved two ACs, their caregivers, and three motion experts. We identified goals for conversational support, co-designed prototypes depicting possible sidekick forms, and enacted different sidekick motions and behaviors to achieve speakers' goals. We contribute guidelines for designing sidekicks that support ACs according to three key parameters: attention, precision, and timing. We show how these parameters manifest in appearance and behavior and how they can guide future designs for augmented nonverbal communication.more » « less
An official website of the United States government

Full Text Available